This paper introduces Lumina-T2X, a family of flow-based large diffusion transformers that transform noise into images, videos, 3D objects, and audio, with generation conditioned on text. Key techniques such as tokenized representations, learnable placeholders, rotary position embeddings (RoPE), RMSNorm, and flow matching enable unified training and flexible generation across modalities and resolutions. Models sc...
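
Two of the techniques named above, RMSNorm and flow matching, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`rms_norm`, `flow_matching_pair`, `flow_matching_loss`) are hypothetical, and the straight-line path between noise and data is an assumption made here for illustration, using plain Python lists in place of tensors.

```python
import math


def rms_norm(x, gain=None, eps=1e-6):
    """RMSNorm: divide a vector by its root mean square (no mean subtraction)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    if gain is None:
        gain = [1.0] * len(x)  # identity gain when no learned scale is supplied
    return [g * v / rms for g, v in zip(gain, x)]


def flow_matching_pair(x_data, noise, t):
    """Build one training pair on an assumed straight-line path.

    t = 0 gives pure noise and t = 1 gives the data sample. The regression
    target is the constant velocity along that line, x_data - noise.
    """
    x_t = [t * d + (1.0 - t) * n for d, n in zip(x_data, noise)]
    v_target = [d - n for d, n in zip(x_data, noise)]
    return x_t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)
```

In a training loop, a model would receive `x_t`, `t`, and the text condition, and its output would be passed to `flow_matching_loss` as `v_pred`. At sampling time the learned velocity would be integrated from noise at `t = 0` toward data at `t = 1`.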